Search CORE

59 research outputs found

Character-Word LSTM Language Models

Author: Pelemans Joris
Van hamme Hugo
Verwimp Lyan
Wambacq Patrick
Publication date: 01/01/2017

We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline model with a similar number of parameters, and 4.57% on Dutch. Moreover, we also outperform baseline word-level models with a larger number of parameters.

arXiv.org e-Print Archive

Crossref
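The abstract above describes concatenating word and character embeddings into a single input vector. A minimal sketch of that input construction follows, assuming toy random lookup tables and a fixed number of characters taken from the start of each word (the abstract does not say which characters are used). All sizes and names (`WORD_DIM`, `CHAR_DIM`, `NUM_CHARS`, `char_word_input`) are hypothetical, and the LSTM itself is omitted.

```python
import random

# Illustrative sizes only; not taken from the paper.
WORD_DIM = 4
CHAR_DIM = 2
NUM_CHARS = 3  # assumption: a fixed number of characters from the start of each word
PAD = "<pad>"
UNK = "<unk>"


def make_table(symbols, dim, seed):
    """Toy embedding table: maps each symbol to a small random vector."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for s in symbols}


word_table = make_table(["the", "cat", "cats", UNK], WORD_DIM, seed=0)
char_table = make_table(list("abcdefghijklmnopqrstuvwxyz") + [PAD], CHAR_DIM, seed=1)


def char_word_input(word):
    """Concatenate a word embedding with embeddings of the word's first NUM_CHARS characters.

    Words shorter than NUM_CHARS are padded; characters outside the toy
    alphabet also fall back to the padding vector. Out-of-vocabulary words get
    the <unk> word vector but still keep their own character vectors.
    """
    word_vec = word_table.get(word, word_table[UNK])
    chars = list(word[:NUM_CHARS]) + [PAD] * (NUM_CHARS - len(word[:NUM_CHARS]))
    vector = list(word_vec)
    for c in chars:
        vector.extend(char_table.get(c, char_table[PAD]))
    return vector


def relative_improvement(baseline_ppl, new_ppl):
    """Relative perplexity reduction in percent."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl
```

With these toy tables, "cat" and "cats" get different word vectors but identical character parts, which illustrates how character information can expose structural similarity, and an unknown word such as "dogs" still contributes its own characters. The `relative_improvement` helper only shows how a relative figure like 2.77% is computed from two perplexities; it does not reproduce the paper's numbers.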

Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation

Author: Chelba Ciprian
Pelemans Joris
Shazeer Noam
Publication date: 26/06/2015

We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments empirically evaluating it on the One Billion Word Benchmark shows that SNM n-gram LMs perform almost as well as the well-established Kneser-Ney (KN) models. When using skip-gram features the models are able to match the state-of-the-art recurrent neural network (RNN) LMs; combining the two modeling techniques yields the best known result on the benchmark. The computational advantages of SNM over both maximum entropy and RNN LM estimation are probably its main strength, promising an approach that has the same flexibility in combining arbitrary features effectively and yet should scale to very large amounts of data as gracefully as n-gram LMs do.

arXiv.org e-Print Archive

CiteSeerX
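One simplified reading of SNM-style scoring: each feature active in the history selects a row of a non-negative matrix, the selected rows are summed, and the sum is normalized over the vocabulary. The sketch below uses plain relative frequencies C(f, w) / C(f) as matrix entries and omits any learned adjustment of those entries, so it is not the estimation procedure of the paper. The feature templates (`prev1` for the previous word, `skip1` for the word two positions back with one word skipped) and all function names are hypothetical.

```python
from collections import defaultdict


def extract_features(history):
    """Hypothetical feature templates: previous word, and a one-word skip feature."""
    feats = []
    if len(history) >= 1:
        feats.append(("prev1", history[-1]))
    if len(history) >= 2:
        feats.append(("skip1", history[-2]))
    return feats


def train(sentences):
    """Count target words per active feature and turn counts into relative frequencies."""
    pair_counts = defaultdict(lambda: defaultdict(int))
    feat_counts = defaultdict(int)
    for sent in sentences:
        for i in range(1, len(sent)):
            for f in extract_features(sent[:i]):
                pair_counts[f][sent[i]] += 1
                feat_counts[f] += 1
    # Non-negative matrix stored sparsely: matrix[f][w] = C(f, w) / C(f).
    return {f: {w: c / feat_counts[f] for w, c in row.items()}
            for f, row in pair_counts.items()}


def prob(matrix, history, word):
    """Sum the rows of the active features, then normalize over all scored words."""
    scores = defaultdict(float)
    for f in extract_features(history):
        for w, m in matrix.get(f, {}).items():
            scores[w] += m
    total = sum(scores.values())
    return scores[word] / total if total > 0 else 0.0
```

For example, after `m = train([["<s>", "the", "cat", "sat"], ["<s>", "the", "dog", "sat"]])`, the history `["<s>", "the"]` splits its probability equally between "cat" and "dog". Because every active feature only adds non-negative mass, adding more feature templates (such as longer skip-grams) changes only the feature extraction, not the scoring code.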

STON efficient subtitling in Dutch using state-of-the-art tools

Author: Demuynck Kris
Desplanques Brecht
Lycke Mariek
Pelemans Joris
Verwimp Lyan
Wambacq Patrick
Publication venue: International Speech Communication Association (ISCA)
Publication date: 01/01/2016

Ghent University Academic Bibliography

The effect of word similarity on N-gram language models in Northern and Southern Dutch

Author: Demuynck Kris
Pelemans Joris
Van hamme Hugo
Wambacq Patrick
Publication date: 01/01/2014

In this paper we examine several combinations of classical N-gram language models with more advanced, well-known techniques based on word similarity, such as cache models and Latent Semantic Analysis. We compare the efficiency of these combined models to a model that combines N-grams with the recently proposed, state-of-the-art neural network-based continuous skip-gram. We discuss the strengths and weaknesses of each of these models, based on their predictive power for the Dutch language, and find that a linear interpolation of a 3-gram, a cache model and a continuous skip-gram is capable of reducing perplexity by up to 18.63% compared to a 3-gram baseline. This is three times the reduction achieved with a 5-gram. In addition, we investigate whether, and in what way, the effect of Southern Dutch training material on these combined models differs when evaluated on Northern and Southern Dutch material. Experiments on Dutch newspaper and magazine material suggest that N-grams are mostly influenced by the register and not so much by the language (variety) of the training material. Word similarity models, on the other hand, seem to perform best when they are trained on material in the same language (variety).

Ghent University Academic Bibliography

Archivsystem Ask23
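The linear interpolation described in the abstract above mixes the per-word probabilities of several component models with weights that sum to one, and perplexity is the exponentiated average negative log-probability. A minimal sketch of both, plus a unigram cache over recently seen words, is shown below; the cache window size and all names (`CacheModel`, `interpolate`, `perplexity`) are hypothetical, and the 3-gram and skip-gram components are assumed to be supplied as plain probabilities.

```python
import math
from collections import deque


class CacheModel:
    """Unigram cache: P(w) is the relative frequency of w among the last `size` words."""

    def __init__(self, size=100):  # window size is an assumed hyperparameter
        self.window = deque(maxlen=size)

    def prob(self, word):
        if not self.window:
            return 0.0
        return self.window.count(word) / len(self.window)

    def update(self, word):
        self.window.append(word)


def interpolate(probs, weights):
    """Linear interpolation: P(w | h) = sum_i lambda_i * P_i(w | h), with sum_i lambda_i = 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return sum(lam * p for lam, p in zip(weights, probs))


def perplexity(word_probs):
    """exp of the average negative log-probability over the evaluated words."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))
```

In use, each test word would be scored as `interpolate([p_3gram, cache.prob(w), p_skipgram], weights)` before calling `cache.update(w)`, and the resulting probabilities passed to `perplexity`. The cache component may assign zero probability to a word, which is harmless as long as another component with a nonzero weight does not.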

A Comparison of Different Punctuation Prediction Approaches in a Translation Context

Author: Pelemans Joris
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: European Association for Machine Translation
Publication date: 01/01/2018

We test a series of techniques to predict punctuation and its effect on machine translation (MT) quality. Several techniques for punctuation prediction are compared: language modeling techniques, such as n-grams and long short-term memories (LSTMs), sequence labeling LSTMs (unidirectional and bidirectional), and monolingual phrase-based, hierarchical and neural MT. For actual translation, phrase-based, hierarchical and neural MT are investigated. We observe that for punctuation prediction, phrase-based statistical MT and neural MT reach similar results, and are best used as a preprocessing step followed by neural MT to perform the actual translation. Implicit punctuation insertion by a dedicated neural MT system, trained on unpunctuated source and punctuated target, yields similar results. This research was done in the context of the SCATE project, funded by the Flemish Agency for Innovation and Entrepreneurship (IWT project 13007).

Repositorio Institucional de la Universidad de Alicante
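The sequence labeling approach mentioned in the abstract above can be framed as tagging each word with the punctuation mark that follows it. A minimal sketch of how training pairs could be derived from punctuated text, and how predicted labels could be turned back into punctuated text, is given below; the three-mark label set and the function names are hypothetical, and the abstract does not specify the actual label inventory or tokenization.

```python
# Assumed label set; "O" means no punctuation follows the word.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_labeled(tokens):
    """Turn a punctuated token sequence into (word, label) pairs.

    A punctuation token is attached to the preceding word. Punctuation at the
    very start of the sequence is dropped, and if several marks follow one
    word, only the last one is kept.
    """
    pairs = []
    for tok in tokens:
        if tok in PUNCT:
            if pairs:
                word, _ = pairs[-1]
                pairs[-1] = (word, PUNCT[tok])
        else:
            pairs.append((tok, "O"))
    return pairs


def restore(pairs):
    """Inverse of to_labeled: re-insert punctuation after each labeled word."""
    inverse = {label: mark for mark, label in PUNCT.items()}
    out = []
    for word, label in pairs:
        out.append(word)
        if label != "O":
            out.append(inverse[label])
    return out
```

The words of the labeled pairs form the unpunctuated input a tagger would see, while the labels are its training targets; `restore` then reinserts the predicted marks before the text is passed to an MT system.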

Improving the translation environment for professional translators

Author: Augustinus Liesbeth
Bulté Bram
Buysschaert Joost
Coppers Sven
Daems Joke
Heyman Geert
Hoste Veronique
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: MDPI AG
Publication date: 01/01/2019

When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography

Smart computer-aided translation environment (SCATE): highlights

Author: Augustinus Liesbeth
Brulmans Jens
Bulté Bram
Coninx Karin
Coppers Sven
Heyman Geert
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication date: 01/01/2018

Ghent University Academic Bibliography

Smart Computer-Aided Translation Environment (SCATE): Highlights

Author: Augustinus Liesbeth
Brulmans Jens
Bulté Bram
Coninx Karin
Coppers Sven
Heyman Geert
Lefever Els
van der Lek-Ciudin Iulianna
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: European Association for Machine Translation
Publication date: 01/01/2018

We present the highlights of the now finished 4-year SCATE project. It was completed in February 2018 and funded by the Flemish Government IWT-SBO, project No. 130041.

Repositorio Institucional de la Universidad de Alicante

Ghent University Academic Bibliography